Efficient Code Optimization Technique for Itanium2 Cache System and Scientific Computing

نویسندگان

  • William JALBY
  • Christophe LEMUET
چکیده

To keep up with a large degree of ILP, Itanium2 L2 cache system uses a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of this cache system on memory instruction scheduling. We demonstrate that for scientific codes, “memory access vectorization” allows to generate very efficient code (up to the maximum of 4 loads per cycle). The impact of such “vectorization” on register pressure is analyzed: various register allocation schemes are proposed and evaluated.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Optimizing Performance on Modern HPC Systems: Learning From Simple Kernel Benchmarks

We discuss basic optimization and parallelization strategies for current cache-based microprocessors (Intel Itanium2, Intel Netburst and AMD64 variants) in single-CPU and shared memory environments. Using selected kernel benchmarks representing data intensive applications we focus on the effective bandwidths attainable, which is still suboptimal using current compilers. We stress the need for a...

متن کامل

Combinatorial Problems in Compiler Optimization

Several important compiler optimizations such as instruction scheduling and register allocation are fundamentally hard and are usually solved using heuristics or approximate solutions. In contrast, this thesis examines optimal solutions to three combinatorial problems in compiler optimization. The first problem addresses instruction scheduling for clustered architectures, popular in embedded sy...

متن کامل

An efficient memory operations optimization technique for vector loops on Itanium 2 processors

To keep up with a large degree of instruction level parallelism (ILP), the Itanium 2 cache systems use a complex organization scheme: load/store queues, banking and interleaving. In this paper, we study the impact of these cache systems on memory instructions scheduling. We demonstrate that, if no care is taken at compile time, the non-precise memory disambiguation mechanism and the banking str...

متن کامل

SECTORED CACHE PERFORMANCE EVALUATION: A case study on the KSR-1 data subcache

Sectoring is a cache design and management technique that is re-emerging as cache sizes get larger and computer designers strive to exploit the possible gains from using large block (line) sizes due to spatial locality. Sectoring allows for a small tag array size to suffice retaining address tags only for the large blocks, but still avoids huge miss penalties by utilizing a smaller transfer siz...

متن کامل

A Stable and Efficient Loop Tiling Algorithm

Loop tiling is an effective optimizing transformation to boost the memory performance of a program, especially for dense matrix scientific computations. The magnitude and stability of the achieved performance improvements is heavily dependent on the appropriate selection of tile sizes. Many existing tile selection algorithms try to find tile sizes which eliminate self-interference cache conflic...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007